Abstract

A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of 
life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and their classification
performance was assessed. However, they were usually trained on a large amount of training data (e.g., 15300). In this
study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning 
method using 900 training data. In addition, the classification performances of the ensemble classifier with naive 
partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the 
stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) 
classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and 
an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by 
overlapped partitioning and the ensemble classifier with overlapped partitioning requires less training data than that with 
naive partitioning. This study contributes towards reducing the required amount of training data and achieving better 
classification performance. 
Introduction

The P300 is a component of an event-related potential (ERP) in 
a non-invasive scalp electroencephalogram (EEG) that was 
discovered by Sutton et al. [1]. The P300 appears as a positive 
peak approximately 300 milliseconds (ms) after a rare or surprising 
stimulus. The P300 is elicited by the oddball paradigm: rare 
(target) and non-rare (non-target) stimuli are presented to a 
participant, who then silently counts the occurrences of the target
stimuli. The P300 can be seen in the ERPs corresponding
to the target stimuli. Visual and auditory stimuli have often been
used to elicit the P300 [2,3]. Currently, the P300 is used in
brain-computer interfaces (BCIs) for controlling devices.

The P300 was first utilized for spelling out letters by Farwell and 
Donchin in 1988 [4]. They proposed a BCI system that typed 
letters according to the detected P300 elicited by the visual target 
stimuli, referred to as a P300-based BCI or a P300 speller. The 
P300-based BCI can control not only a speller but also a 
wheelchair [5,6], computer-mouse [7], web browser [8], virtual 
reality system [9], game [10], or smart phone [11]. Since the BCI
does not depend on muscle activity, it constitutes a new interface 
that will provide a better quality of life for patients disabled by 
neuromuscular diseases, such as amyotrophic lateral sclerosis 
(ALS) [12]. The interface, classification methods, and their 
extensions have been studied for more than 20 years (e.g., [13-15]).

Stepwise linear discriminant analysis (SWLDA) has been widely 
used as a standard classification algorithm for the P300-based BCI
[16-19]. Farwell and Donchin first proposed the SWLDA,
together with the entire classification protocol for P300 [4]. 
Schalk et al. proposed a general-purpose BCI system, named 
BCI2000, in which the P300-based BCI was implemented 
together with the SWLDA [20]. Krusienski et al. compared the
classification algorithms for BCI [21]. Specifically, they compared 
the classification accuracy of Pearson's correlation method, linear 
discriminant analysis (LDA), SWLDA, linear support vector 
machine (SVM), and Gaussian kernel SVM. The results showed 
that LDA and SWLDA achieved a better performance than the 
others. Blankertz et al. proposed an LDA with shrinkage for
P300-based BCI that yielded a better performance than SWLDA when
a small amount of training data was given [22].

Ensemble classifiers are among the most powerful classifiers for 
the P300-based BCI; however, they were developed and evaluated 
using a relatively large amount of training data. The ensemble of 
SVMs proposed by Rakotomamonjy and Guigue won the BCI 
competition III data set II that contains a huge amount of training 
data (15300 ERP data) [23]. They applied the ensemble classifiers 
to reduce the influence of signal variability using the classifier 
output averaging technique [24]. Salvaris et al. compared the
classification accuracies of ensemble LDA and ensemble SVM
classifiers using the BCI competition III data set II and BCI
competition II data set IIb (7560 training data) [25]. They also
employed an ensemble of six linear SVM classifiers and evaluated
classification accuracies using their own data by 16-fold
cross-validation [26]. An ensemble SWLDA classifier was first proposed
by Johnson et al. and evaluated on their own P300-based BCI data 
Figure 1. Experimental design. We analyzed two P300-based BCI data sets, A and B. Data set A was recorded in this online
experiment. The recorded data set A is divided into q pairs of training and test data by p/q cross-validation (see Figure 4). Then the classification is
performed for all pairs to compute the classification accuracy (see Figure 5). The overlapped partitioning is employed to train ensemble classifiers.
Data set B (BCI competition III data set II) contains separated training data and test data. The data set was also classified by the proposed classifiers.
doi:10.1371/journal.pone.0093045.g001



(6480 training ERP data) [27]. Arjona et al. evaluated a variety of 
ensemble LDA classifiers using 3024 training data [28]. 

In online (real-time) P300-based BCI experiments, smaller
amounts of training data than those in the
BCI competition III data set II and BCI competition II data set IIb
have tended to be used. Townsend et al. recorded 3230 ERP training
data for a row-column paradigm and 4560 ERP training data for a 
checkerboard paradigm [15]. Guger et al. evaluated the online 
performances of P300-based BCI, where LDA was trained on 
1125 ERP training data [29]. EEG data are usually high
dimensional, and the target training data that contain the P300 are
rare (e.g., 1/6) and have different statistical properties from the
non-target data. In other words, researchers must address the class
imbalance problem [30], which makes classifiers severely prone to overfitting. Thus,
even thousands of training data can be considered small in this field.
To be practical, the amount of the training data should be small in 
order to reduce the training time [21]. However, most of the 
studies on the ensemble classifiers for the P300-based BCI did not 
evaluate the classification accuracy using a practical amount of 
training data, e.g., less than 1000 ERP data. 



In an online experiment where less than 1000 training data are 
given, the ensemble classifier may not perform well because of its 
method of partitioning training data. Most ensemble classifiers 
employ naive partitioning that divides training data into partitions 
by sets of data associated with a target command [23]. With
naive partitioning, training data are partitioned
without overlaps. Johnson et al. also employed the same
partitioning method [27]. Due to the naive partitioning method, however,
each weak learner in the ensemble classifier is trained on a smaller
amount of training data than a single classifier. In addition, the
dimension of the EEG data is usually high. In such cases, classifiers
are prone to overfitting [32]. Thus, the classification performance
of the ensemble classifiers may deteriorate when the amount of 
training data is small and ensemble classifiers should therefore be 
evaluated when less than 1000 training data are given. 

To develop a better classifier that requires less than 1000 
training data, we propose a new overlapped partitioning method 
to train an ensemble LDA classifier, which we evaluated when 900 
training data were given. The overlapped partitioning allows a 
larger amount of training data to be contained in a partition, 
Figure 2. Structure of the P300-based BCI system. A target letter is presented to a participant, then letters on the stimulator are intensified by
row or by column. The participant must do a mental task: silently count when the target letter is intensified. During this, the event-related potentials
(ERPs) that contain the P300 component are recorded from the scalp. The signals are amplified, digitized, then stored in a computer. After finishing all
intensifications, the signals are processed to predict a letter, then the feedback is displayed.
doi:10.1371/journal.pone.0093045.g002



although a part of the training data were reused. The proposed 
classifiers were evaluated on our original P300-based BCI data set 
and the BCI competition III data set II, using small (900) training 
data and large (over 8000) training data. One of three conditions 
for dimension reduction was applied: the stepwise method, 
principal component analysis (PCA) or none. Our objective was 
to clarify how the ensemble LDA classifiers with overlapped or 
naive partitioning and the single LDA classifier performed when 
900 training data were given. 

Overlapped partitioning is a new partitioning method that is 
applied in the training of an ensemble classifier, and is designed 
such that it will be suitable for application in P300-based BCI. 
When we evaluated the performance of the new method, we also
assessed the influences of dimension reduction methods. The
algorithms were first compared under the condition that 900
training data were used, which is the smallest amount of data
used to date for the evaluation of ensemble classifiers for
P300-based BCI. In addition, the influence of the degree of overlap used
in the ensemble classifier with overlapped partitioning was
demonstrated for the first time. We consider that the overlapped
partitioning is essential to implement the ensemble classifiers in an
online system. This study contributes towards reducing the
required amount of training data and achieving better
classification performance in an online experiment.




Figure 3. Stimulator for the P300-based BCI. It has 36 gray letters that form a matrix in the center. Each column of the matrix is numbered 1-6
and each row 7-12. A target letter is provided at the top center of the stimulator and the predicted letter is shown at the top right as feedback in test
sessions.
doi:10.1371/journal.pone.0093045.g003
Table 1. Parameters of stimulators, data acquisition, and preprocessing methods for data sets A and B.

Parameter                             Data set A    Data set B
#letters                              36            36
#rows                                 6             6
#columns                              6             6
#intensification sequences            15            15
Intensification duration (ms)         100           100
Blank duration (ms)                   75            75
Target presentation duration (s)      3             2.5
Feedback presentation duration (s)    1             2.5
#participants                         10            2
#recorded letters                     50            training: 85, test: 100
#channels                             8             64
Sampling rate (Hz)                    128           240
Bandpass filter (Hz)                  0.11-30       0.1-60
ERP buffer length (ms)                700           700
Baseline buffer length (ms)           pre-100       pre-100
Moving average (window size)          3             18
Downsampling (Hz)                     43            20
Methods

Ethics Statement 

This research plan was approved by the Internal Ethics 
Committee at Kyushu Institute of Technology. The possible risks, 
mental task, and approximate measurement time were explained 
to all participants. In addition, all participants gave their written 
informed consent before participating in this experiment. 

Experimental Design 

Ensemble classifiers with the proposed overlapped partitioning 
were evaluated on our original P300-based BCI data set (data set 
A) and BCI competition III data set II (data set B) as shown in
Figure 1. The primary objective is to clarify how the overlapped 
partitioning for ensemble classifiers influences the classification 
accuracy. The second objective is to confirm how the three 
conditions for dimension reduction (stepwise method, PCA, or 
none) improve classification performances. 

Data Set A: Our Original P300-based BCI Data Set 

To evaluate ensemble classifiers, we recorded EEG data using
an online P300-based BCI, and then computed the classification
accuracy offline. During the EEG recording, visual stimuli were
provided to the participant. At the same time, the participant 
performed a mental task. The recorded signals were amplified, 
digitized, and then preprocessed before a letter was predicted. Our 
data contains P300-based BCI data from 10 participants that can 
be used for better statistical analysis. Parameters used in the 
stimulus and the recording method of data set A are summarized 
in Table 1. 

Participants. Eleven healthy participants (ten males and one 
female aged 22-28 years old) participated in this study. They had 
no prior experience of controlling P300-based BCI. During the 
experiment, we checked the participants' obtained waveform as 
well as their health status. However, one male participant could 



not complete the task due to sickness. Thus, we finally analyzed 
data from ten participants in offline analysis. 

Devices. The P300-based BCI consisted of a stimulator, 
amplifier, A/D converter, and computer as shown in Figure 2. 
EEG signals were recorded at Fz, Cz, P3, Pz, P4, PO7, Oz, and
PO8 scalp sites according to the international 10-20 system, which
is the alignment commonly used for P300-based BCI [9]. The
ground electrode was located at the AFz site and the reference
electrodes were located on the mastoids. The EEG signals were
filtered (0.11-30 Hz band-pass filter) and amplified 25000 times
with a BA1008 (TEAC Co. Ltd., Japan). Then, the signals were
digitized by an AIO-163202FX-USB analog I/O unit (CONTEC
Co. Ltd., Japan). The sampling rate was 128 Hz. The P300-based
BCI was implemented in MATLAB/Simulink (Mathworks Inc.,
USA). The recorded signals were analyzed offline using MATLAB.
Stimuli for the P300-based BCI were presented on a TFT
LCD display (HTBTF-24W, 24.6 inches wide with 1920 x 1080 
dpi; Princeton Technology Ltd., Japan) located 60 cm in front of 
the participant. 

Stimuli. We employed most of the parameters of the 
stimulator that were used in the BCI competition III data set II 
[23]. The stimulator of the P300-based BCI consists of 36 gray 
letters that form a 6 x 6 matrix, a target indicator, and a feedback 
indicator (see Figure 3). All the columns and rows of the matrix 
were numbered to manage intensifications and for the subsequent 
prediction of a letter. The set of column numbers was
C = {1,2,3,4,5,6}, while the set of row numbers was
R = {7,8,9,10,11,12}. In addition, the set of all the intensifications
was I = C ∪ R. A row or a column of gray letters in the matrix
turned white for 100 ms (intensification duration), and then
changed to gray again for 75 ms (blank duration). At least
n(I) = 12 intensifications were required to identify an input letter
out of the 36 letters. This is called a sequence. The order of rows and
columns within a sequence was determined by a random permutation. The number
of intensification sequences N_s was fixed to 15 in the online
Figure 4. Procedure of 1/10 cross-validation used for the evaluation on data set A. In this case, p = 1 and q = 10. ERP data sets
corresponding to fifty letters inputted by a participant were measured. The square aligned at the top illustrates a data set that contained 180 ERP
data, 30 of which were labeled as target ERPs, while the others were labeled as non-target ERPs. These data sets were sorted according to measured
time. The data sets were divided into ten groups. Then, two successive groups were selected. The former group was assigned to training data and the
latter to test data. Then, each weak learner in the ensemble classifier was trained on the assigned training data and tested using the following test
data.
doi:10.1371/journal.pone.0093045.g004



experiment (i.e., 180 intensifications), while N_s was varied from 1
to 15 in the offline analysis.
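The intensification schedule for one letter can be illustrated with a minimal sketch. It assumes that each sequence is an independent random permutation of the 12 row and column numbers; the function name `make_letter_schedule` and the `seed` argument are hypothetical and are not part of the original system, which was implemented in MATLAB/Simulink.

```python
import random

COLUMNS = [1, 2, 3, 4, 5, 6]        # set C
ROWS = [7, 8, 9, 10, 11, 12]        # set R
INTENSIFICATIONS = COLUMNS + ROWS   # I = C ∪ R, n(I) = 12


def make_letter_schedule(n_sequences=15, seed=None):
    """Return n_sequences random permutations of the 12 intensifications.

    Hypothetical helper: each inner list is one sequence, i.e. every row and
    column is intensified exactly once in random order.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_sequences):
        sequence = INTENSIFICATIONS[:]
        rng.shuffle(sequence)
        schedule.append(sequence)
    return schedule


schedule = make_letter_schedule(n_sequences=15, seed=0)
n_intensifications = sum(len(sequence) for sequence in schedule)
```

With N_s = 15 the schedule contains 15 x 12 = 180 intensifications per letter, which matches the number stated for the online experiment.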

Preprocessing. EEG data were preprocessed for both online 
recording and offline analysis. The data were trimmed from the 
beginning of each intensification to 700 ms (8 channels x 89
samples). Each 100 ms pre-stimulus baseline was subtracted from
the corresponding ERP data. Subsequently, ERP data were
smoothed (using a moving average with a window size of 3),
downsampled to 43 Hz (8 channels x 30 samples), and vectorized
into a 240-dimensional feature vector (channels x samples).
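The preprocessing chain can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB code: the baseline is taken as the per-channel mean of the pre-stimulus samples, the moving average uses zero-padded edges, and downsampling keeps every third sample (128 Hz / 3 is approximately 43 Hz). The function name `preprocess_epoch` and the 13-sample baseline length are hypothetical.

```python
import numpy as np


def preprocess_epoch(epoch, baseline, window=3, step=3):
    """Turn one ERP epoch into a feature vector.

    epoch:    (channels, samples) array, post-stimulus data (700 ms).
    baseline: (channels, samples) array, pre-stimulus data (100 ms).
    window:   moving-average window size in samples.
    step:     keep every step-th sample (assumed decimation scheme).
    """
    # Baseline correction: subtract the per-channel pre-stimulus mean (assumption).
    corrected = epoch - baseline.mean(axis=1, keepdims=True)
    # Moving average along time, one channel at a time (zero-padded at the edges).
    kernel = np.ones(window) / window
    smoothed = np.vstack(
        [np.convolve(channel, kernel, mode="same") for channel in corrected]
    )
    # Downsample by keeping every step-th sample, then vectorize.
    downsampled = smoothed[:, ::step]
    return downsampled.reshape(-1)


rng = np.random.default_rng(0)
epoch_a = rng.standard_normal((8, 89))      # data set A: 8 channels x 89 samples
baseline_a = rng.standard_normal((8, 13))   # about 100 ms at 128 Hz (assumed length)
features_a = preprocess_epoch(epoch_a, baseline_a, window=3, step=3)
```

For data set A this yields 8 x 30 = 240 features. Under the same assumptions, data set B (64 channels x 168 samples, window size 18, every 12th sample) would yield 64 x 14 = 896 features.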

Sessions and a mental task. EEG data of P300-based BCI 
were recorded through a training session and ten test sessions, 
where only the data in the test sessions were evaluated by our 
proposed p/q cross-validation in the offline analysis. In each 
session, a participant was required to spell out five letters using the 
P300-based BCI. A target letter to be inputted was selected 
randomly by the system. Thus, 900 ERPs (5 letters x 1 session
x 12 intensifications x 15 sequences) were recorded in the
training session and 9000 ERPs (5 letters x 10 sessions x 12
intensifications x 15 sequences) in the test sessions. A target letter was
displayed for 3 s, and then intensifications were presented. The 
participant was asked to perform the oddball task to elicit P300: 



the participant had to focus on the cued target letter and count 
silently when the letter was intensified. During the sessions, 
observed EEG data were recorded. In the training session, the 
feedback was not displayed. In the test sessions the feedback was 
shown in the feedback indicator for 1 s at the end of all 15 
intensification sequences for the target letter. The online feedback 
was computed using the single LDA classifiers [21] and was 
presented to the participants in order to confirm whether the 
participant conducted the mental task appropriately in the test 
sessions. The feedback of success or failure also helps to
motivate participants [33], even though presenting feedback does
not improve the classification accuracy of P300-based BCI [34]. In
addition, the feedback is essential for participants to acquire the
appropriate mental task [35]. An experimenter also checked the
feedback to make sure that appropriate classification
performance was observed using LDA. All the data
gathered before the current session were used for training the
classifier in the online recording.

Data Set B: BCI Competition III Data Set II 

We also evaluated the proposed ensemble classifiers using BCI 
competition III data set II because many novel and traditional 
Figure 5. Training and testing procedure of the ensemble classifiers for P300-based BCI. Training data flows are represented by blue lines
and test data flows are illustrated by red lines. The training data are divided into N_c overlapped partitions (see Figure 6). One of three conditions for
dimension reduction (DR) is applied to each partitioned data: the stepwise method, PCA, or none. Then, N_c LDA weak learners are trained on these
dimension-reduced data. The training data are used only for the training of weak learners as illustrated by blue lines. After the training session, the
test data are processed to compute scores for decision making.
doi:10.1371/journal.pone.0093045.g005



BCI algorithms have been evaluated using this data set. Since the 
competition data set contains a large amount of training data, we 
evaluated the classification performance using limited training 
data (900 ERPs) in addition to the full training data (15300 ERPs). 
Parameters used in the stimulus and data recording of the data set 
B are also summarized in Table 1. 

Overview of the data set and stimulator. The data set 
contains EEG data for participants A and B. The EEG data were 
recorded from 64 channels. The recorded signals were bandpass 
filtered (0.1-60 Hz) and digitized at 240 Hz. The same procedure 
of intensifications and mental tasks for data set A was also applied 
to the data set B. The differences in the stimulators between data 
sets A and B were in the size, the font and the brightness of letters, 
horizontal/vertical distances of letters, and the method of 
presenting the target and feedback letters. It should be noted that 
the target and feedback presentation times were different between 
these two data sets, though these parameters were not directly 
related to the offline analysis. The data set contains EEG data 
corresponding to 85 target letters for training (85 letters x 12 
intensifications x 15 sequences = 15300 ERPs) and EEG data of 
100 target letters for testing (18000 ERPs) for each participant. A 
more detailed description of the data set can be found in [36].

Preprocessing. The same preprocessing method was used
for data sets A and B; however, different parameters were
employed because the sampling rate and the number of channels
for data set B were larger than those for data set A. Data from all 64
channels were used for the offline analysis. The data were trimmed
from the beginning of each intensification to 700 ms (64 channels
x 168 samples). Each 100 ms pre-stimulus baseline was subtracted
from the ERP data. ERP data were smoothed (using a moving
average with a window size of 18), downsampled to 20 Hz (64
channels x 14 samples), and vectorized into an 896-dimensional
feature vector (channels x samples).
The vectorized data are handled as feature vectors in the
classification.

Ensemble classifiers with overlapped partitioning 

The ensemble classifier divides the given training data into
partitions, and those partitions are used to train multiple
classifiers in the ensemble classifier. Each classifier in the ensemble
classifier is called a "weak learner." The number of weak learners
is denoted by N_c. The training data were divided into N_c
partitions using overlapped partitioning. A dimension reduction
method was applied to these partitioned data, and then N_c LDA
weak learners were trained. The test data corresponding to a letter
were processed to compute the scores, and then the scores were
translated into a predicted letter. To evaluate the classification
performance using thousands of training data, the proposed p/q
cross-validation was applied.

p/q cross-validation. p/q cross-validation is a special
cross-validation that can reduce the amount of training data. For a fair
Figure 6. Overlapped partitioning when N_c = 5 and N_b = 3. Training data were first divided into five blocks. Assuming that those five blocks
were aligned around a circle, three continuous blocks were selected to form a partition. As a result, five partitions were prepared. The partitioned
training data sets were used to train weak learners in the ensemble classifier.
doi:10.1371/journal.pone.0093045.g006



comparison of the classification accuracy, the amount of training 
data used in the offline analysis should be reduced to less than 
1000. The traditional cross-validation method is not suitable 
because it provides at least 4500 training data in this case. Instead, 
we employed the proposed p/q cross-validation that performed
q-fold cross-validation, where p/q of all data were assigned to the
training data.

First, the ERP training data are divided into q groups. Second,
assuming that the groups are aligned around a circle, p + 1 groups
from the uth group (u ∈ {1,2,...,q}) are sequentially selected. Then, the first p
consecutive groups are assigned to the training data, and the last
single group is assigned to the test data. The above procedures
are repeated for all u. In total, q pairs of training and test data are
prepared. For each pair, classification is performed. The
classification accuracy can be computed as #correct letter /
#total letter, where #total letter is the total number of letters
and #correct letter is the number of correct predictions among all
pairs. It should be noted that (q-1)/q cross-validation is
equivalent to the conventional q-fold cross-validation.
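The grouping procedure described above can be sketched as follows. This is a minimal illustration that only produces group indices (0-based); the function name `pq_cross_validation` is hypothetical.

```python
def pq_cross_validation(q, p):
    """Return q (training_groups, test_group) pairs of 0-based group indices.

    For each starting group u, p + 1 circularly consecutive groups are
    selected; the first p form the training data and the last one the test data.
    """
    if not 1 <= p < q:
        raise ValueError("p must satisfy 1 <= p < q")
    pairs = []
    for u in range(q):
        selected = [(u + offset) % q for offset in range(p + 1)]
        pairs.append((selected[:p], selected[p]))
    return pairs


pairs_1_10 = pq_cross_validation(q=10, p=1)   # 1/10 cross-validation
pairs_9_10 = pq_cross_validation(q=10, p=9)   # equivalent to 10-fold cross-validation
```

In the 1/10 case each training set consists of a single group and the following group is used for testing; in the 9/10 case every group serves as the test data exactly once while the remaining nine groups are used for training.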

In the present study, we have evaluated data set A using the 
1/10 cross-validation as shown in Figure 4. In other words, five 
letters out of 50 were assigned to the training data, which 
contained 900 ERPs (9000 ERPs x 1/10). It takes 180.125 
seconds to spell out five letters in the conditions of this online 
experiment, which does not overly tire the participant. In addition 
to the 1/10 cross-validation, we also used the conventional 10-fold 



cross-validation (9/10 cross-validation) in order to compare the 
ensemble classifiers when a large amount of training data were 
provided. Thus, ERPs for 45 letters out of 50 were used as training 
data, which contained 8100 ERPs (9000 ERPs x 9/10). The p/q 
cross-validation was not applied to data set B because the 
competition data set has separated training and test data. 

Overlapped partitioning. In a BCI study on ensemble 
classifiers, naive partitioning was used [23]. With naive
partitioning, the given training data were divided into
partitions by letters without overlaps. Because the partitions
do not overlap, the amount of training data in a partition
becomes small, so that covariance matrices might not be estimated
precisely. Instead of this method, we propose a generalized
partitioning method.

All the procedures for training and testing the proposed 
ensemble classifier for P300-based BCI are shown in Figure 5. 
In overlapped partitioning, sets of training data are divided into N_c
partitions, where overlap between partitions is allowed. In the
first step of the overlapped partitioning method, training data
assigned to input commands were sorted by recorded time and
were divided into N_c blocks without overlaps. Then, assuming that
the blocks were aligned around a circle, N_b consecutive blocks
from the vth block (v ∈ {1,2,...,N_c}) were selected to form a partition.
The procedure was repeated for all v. An example of overlapped
partitioning is shown in Figure 6. Each weak learner was trained
on the partitioned data (see Figure 5). The advantage of this
partitioning method as compared to naive partitioning is that a
larger amount of data is stored in each partition. Thus,
overlapped partitioning may be robust against a shortage of training
data. In the present study, N_c was fixed; however, N_b was varied in
the offline analysis.
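The partitioning step can be sketched as follows. This is a minimal illustration that assumes the time-sorted training items split evenly into N_c blocks; the function name `overlapped_partitions` and the letter labels are hypothetical.

```python
def overlapped_partitions(items, n_c, n_b):
    """Form n_c partitions, each made of n_b circularly consecutive blocks.

    items: time-sorted training units (e.g. one entry per spelled letter).
    n_c:   number of blocks and of partitions (one partition per weak learner).
    n_b:   number of consecutive blocks in each partition.
    """
    if not 1 <= n_b <= n_c:
        raise ValueError("n_b must satisfy 1 <= n_b <= n_c")
    if len(items) % n_c != 0:
        raise ValueError("this sketch assumes equally sized blocks")
    size = len(items) // n_c
    blocks = [items[b * size:(b + 1) * size] for b in range(n_c)]
    partitions = []
    for v in range(n_c):
        partition = []
        for offset in range(n_b):
            partition.extend(blocks[(v + offset) % n_c])
        partitions.append(partition)
    return partitions


letters = ["L1", "L2", "L3", "L4", "L5"]   # five training letters, sorted by time
partitions = overlapped_partitions(letters, n_c=5, n_b=3)
```

With N_c = 5 and N_b = 3, as in Figure 6, each partition holds three of the five letters (540 of the 900 ERPs), and the last partitions wrap around to the first blocks.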

An ensemble classifier with the overlapped partitioning can be 
considered as a special case of bagging used in pattern recognition 
[37]. In bagging, random sampling from the available training
data with overlap allowed is used, which is also referred to as bootstrap
sampling. In contrast, the overlapped partitioning does not have
any randomness, so that no duplicated partition is made except for
a special case. Unlike a standard pattern recognition problem, a
set of EEG data was recorded for every letter, where 30 ERPs
contained P300 and the other 150 ERPs did not. Random
sampling out of the full set of EEG data runs the risk that only a
few ERP data that contain P300 are selected in a partition,
which may deteriorate classification performance. Also, random
sampling out of five blocks of EEG data is not effective
because duplicated partitions could be prepared. The proposed
overlapped partitioning does not have such risks and provides 
different partitions with a constant ratio of EEG data with P300 to 
those without it. Thus, the weak learners of the ensemble classifier 
can efficiently be trained by the overlapped partitioning. 



Dimension reduction. A dimension reduction method has 
often been applied to the BCI because EEG data are usually high 
dimensional. However, the influences of the dimension reduction 
methods have not been evaluated for ensemble classifiers. In this 
study, one of three conditions for dimension reduction was 
applied: two dimension reduction methods (the stepwise method and
PCA) and a control condition without dimension reduction (none). 

• Stepwise method The stepwise method selects suitable
spatiotemporal predictor variables for classification by forward
and backward steps. First, an empty linear regression model is
prepared, then variables are appended through the following
steps. In the forward step, a candidate variable is appended to the
model, and the model is evaluated by an F-test. The F-test yields a
p-value, which is the probability of obtaining the
result by chance. The variable is added if
the p-value of the F-test is lower than a threshold p_in. The
forward step is repeated until no variable is appended. In the
following backward step, a variable is removed from the current
model and the model is again evaluated by the F-test. The
variable is removed if the p-value of the F-test is higher than
a threshold p_out. The backward step is continued until no
variable is removed. Then, the forward step is repeated again.
The final model is determined when no variable is appended
to or removed from the model. The remaining variables in the
final model are used for classification. More details are given in
[21,38]. We set p_in = 0.1 and p_out = 0.15, which are
commonly used for P300-based BCI [21,22].

• Principal component analysis The principal component 
analysis (PCA) is a typical dimension reduction method which 
is based on the eigenvalue decomposition [39], and has also 
been applied to P300-based BCI [10,40]. In summary, the 
covariance matrix of training data is computed and then the 
eigenvalue decomposition is performed. The projected data 
using a normalized eigenvector corresponding to the largest 
eigenvalue is called the first principal component (PC). The 
other PCs can be calculated as well. We applied PCA to data
in each partition, and then used 1-140 PCs for classification on
data set A and 1-400 PCs for classification on data set B.
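The PCA condition can be illustrated with a minimal sketch based on the eigenvalue decomposition of the covariance matrix of one partition. The function names `fit_pca` and `apply_pca`, the random stand-in data, and the choice of 20 components are hypothetical; the stepwise condition is not sketched here.

```python
import numpy as np


def fit_pca(train, n_components):
    """Fit PCA on one partition.

    train: (n_samples, n_features) array of training feature vectors.
    Returns the feature mean and the (n_features, n_components) matrix of
    unit-norm eigenvectors, sorted by decreasing eigenvalue.
    """
    mean = train.mean(axis=0)
    covariance = np.cov(train - mean, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # ascending order
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return mean, eigenvectors[:, order]


def apply_pca(data, mean, components):
    """Project data onto the principal components fitted on the partition."""
    return (data - mean) @ components


rng = np.random.default_rng(0)
partition = rng.standard_normal((180, 240))   # stand-in for one partition of data set A
mean, components = fit_pca(partition, n_components=20)
pcs = apply_pca(partition, mean, components)
```

The same mean and components fitted on a partition would also be applied to the test data before they are passed to the corresponding weak learner.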

Linear discriminant analysis. Linear discriminant analysis 
(LDA) is a frequently used classifier for P300-based BCI. In the 
ensemble classifier, N_c LDA weak learners are implemented. One
of three conditions for dimension reduction is applied to the kth
partitioned data, and then the weight vector of the kth LDA weak
learner is trained as follows:



$$ w_k = \Sigma_k^{-1} (\mu_{k,2} - \mu_{k,1}), \quad k \in \{1,2,\ldots,N_c\}, \qquad (1) $$

where Σ_k is a total covariance matrix over the target and
non-target training data, and μ_{k,2} and μ_{k,1} are the mean vectors of the
target and non-target training data in the kth partition. The
trained weight vectors of each LDA weak learner are used to 
compute the score for the decision making. See [22] for more 
details of a single LDA classifier. 
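Training the weak learners can be sketched as follows. This is an illustration under stated assumptions: the total covariance is estimated over all training vectors of a partition, and a pseudo-inverse is used so that the sketch also runs when the covariance matrix is singular, which the text does not specify. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def train_lda_weights(features, labels):
    """Weight vector of one LDA weak learner.

    features: (n_samples, n_features) array of one partition.
    labels:   (n_samples,) array, 1 for target and 0 for non-target ERPs.
    """
    mu_target = features[labels == 1].mean(axis=0)
    mu_nontarget = features[labels == 0].mean(axis=0)
    # Total covariance over target and non-target data of the partition.
    sigma = np.atleast_2d(np.cov(features, rowvar=False))
    # Pseudo-inverse is an assumption made here to tolerate singular matrices.
    return np.linalg.pinv(sigma) @ (mu_target - mu_nontarget)


def train_ensemble(features, labels, partitions):
    """Train one weak learner per partition (partitions hold sample indices)."""
    return [train_lda_weights(features[idx], labels[idx]) for idx in partitions]


rng = np.random.default_rng(0)
labels = np.array([1] * 30 + [0] * 150)        # 30 target and 150 non-target ERPs
features = rng.standard_normal((180, 10))
features[labels == 1] += 0.5                   # synthetic target/non-target difference
weights = train_lda_weights(features, labels)
ensemble = train_ensemble(
    features, labels, [np.r_[0:20, 30:130], np.r_[10:30, 80:180]]
)
```

Because the weight vector points from the non-target mean towards the target mean, target ERPs receive larger scores on average than non-target ERPs.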



Decision making. To predict a letter, its corresponding test
data were processed to compute scores for decision making. A test
feature vector that belonged to the ith intensification in the jth
sequence in the kth partition after applying dimension reduction
was denoted by x_{i,j,k}, i ∈ I, j ∈ {1,2,...,N_s}. The score s_i corresponding
to an intensification was computed as

$$ s_i = \sum_{j=1}^{N_s} \sum_{k=1}^{N_c} w_k \cdot x_{i,j,k}. \qquad (2) $$

In the offline analysis, the number of intensification sequences
N_s was varied from 1 to 15. The inputted letters were then
predicted by finding maximum scores from row and column
intensifications, respectively:

$$ d = \left( \arg\max_{g \in C} \{s_g\},\ \arg\max_{h \in R} \{s_h\} \right). \qquad (3) $$

The first element of d represents the column number of a 
predicted letter, while the second represents the row number. For 
example, d = (2,9) denotes "N" in Figure 3. 
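The scoring and decision steps can be sketched as below. The array layout, the function names, and the numbering of intensifications (1 to 6 for columns, 7 to 12 for rows, inferred from the example d = (2,9)) are assumptions, and the test data are synthetic.

```python
import numpy as np

def ensemble_scores(weights, features):
    """Sum w_k . x_{i,j,k} over sequences j and weak learners k.

    weights:  (N_c, D) array of weak-learner weight vectors.
    features: (n_intensifications, N_s, N_c, D) array of test vectors.
    """
    return np.einsum("kd,ijkd->i", weights, features)

def decide(scores, n_cols=6):
    """Pick the best column and the best row (1-based intensification numbers)."""
    col = int(np.argmax(scores[:n_cols])) + 1
    row = int(np.argmax(scores[n_cols:])) + n_cols + 1
    return col, row

# Synthetic example: intensifications 2 (a column) and 9 (a row) carry a response.
rng = np.random.default_rng(3)
n_classifiers, dim, n_sequences = 5, 3, 4
weights = np.ones((n_classifiers, dim))
features = 0.01 * rng.normal(size=(12, n_sequences, n_classifiers, dim))
features[1] += 1.0   # intensification 2
features[8] += 1.0   # intensification 9
d = decide(ensemble_scores(weights, features))
```

Mapping `d` to a letter additionally requires the speller matrix layout, which is not reproduced in this sketch.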

Special cases of overlapped partitioning. The ensemble
classifiers with the proposed overlapped partitioning are equivalent
to ensemble classifiers with naive partitioning or to a single
classifier in special cases. That is, the ensemble classifier with
overlapped partitioning becomes the ensemble classifier with naive
partitioning when $N_b = 1$ and $N_c > 1$. In this case, partitions
do not overlap each other, which can be easily seen in Figure 6.
Moreover, the
Table 2. Evaluation parameters of ensemble classifiers with overlapped partitioning on data set A.



Evaluation method        #training letters        #test letters  $N_c$  $N_b$  #training data for a weak learner (ERPs)
1/10 cross-validation    5 letters (900 ERPs)     50 letters         5      1  180
                                                                     5      2  360
                                                                     5      3  540
                                                                     5      4  720
                                                                     5      5  900
9/10 cross-validation    45 letters (8100 ERPs)   50 letters        45      1  180
(conventional 10-fold                                               45      5  900
cross-validation)                                                   45     10  1800
                                                                    45     15  2700
                                                                    45     20  3600
                                                                    45     25  4500
                                                                    45     30  5400
                                                                    45     35  6300
                                                                    45     40  7200
                                                                    45     45  8100

Data set A was evaluated by 1/10 cross-validation (900 training data) and 9/10 cross-validation (8100 training data). The number of weak learners $N_c$ and the
number of blocks $N_b$ were the parameters of the overlapped partitioning. These evaluation methods and parameters determine the amount of training data for a weak
learner in an ensemble classifier. For p/q cross-validation, the number of training letters (#training letters) is given by 50 entire letters x p/q. The number of
training data for a weak learner (#training data for a weak learner) can be computed by 9000 entire ERPs x p/q x $N_b/N_c$.
doi:10.1371/journal.pone.0093045.t002



ensemble classifier also behaves as a single classifier when $N_b = N_c$.
The scores in Equation 3 can be multiplied by an arbitrary
$K > 0$:

$$d = \left( \arg\max_{g \in C} s_g,\ \arg\max_{h \in R} s_h \right) = \left( \arg\max_{g \in C} K s_g,\ \arg\max_{h \in R} K s_h \right). \qquad (4)$$

When $N_b = N_c$, all the partitioned data sets are just duplications
of all the given training data. After a dimension reduction method
has been applied, the same data are stored in all partitions. As a
result, all the weight vectors of the classifiers become the same:

$$\mathbf{w} = \mathbf{w}_1 = \mathbf{w}_2 = \cdots = \mathbf{w}_{N_c}. \qquad (5)$$

Since the final model of the stepwise method or the projection of
the PCA is fitted to the same training data, the test data after
the dimension reduction should be the same:

$$\mathbf{x}_{i,j} = \mathbf{x}_{i,j,1} = \mathbf{x}_{i,j,2} = \cdots = \mathbf{x}_{i,j,N_c}. \qquad (6)$$

Considering Equations 5 and 6, the score for decision making
instead of Equation 2 is computed by

$$s_i = N_c \sum_{j=1}^{N_s} \mathbf{w} \cdot \mathbf{x}_{i,j}, \quad i \in I. \qquad (7)$$

On the other hand, the score for a single classifier is formed as

$$s'_i = \sum_{j=1}^{N_s} \mathbf{w} \cdot \mathbf{x}_{i,j}. \qquad (8)$$

Thus, the relationship between the single classifier and the
overlapped ensemble classifiers that have $N_b = N_c$ is

$$s_i = N_c s'_i. \qquad (9)$$

From Equation 4, $s_i$ and $s'_i$ work in the same way for decision
making. Therefore, the ensemble classifier with overlapped
partitioning that satisfies $N_b = N_c$ corresponds to a single classifier.
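The argument of Equations 5 to 9 can be checked numerically with a small sketch. The dimensions and random data below are hypothetical; the point is only that identical weak learners scale every score by $N_c$, which leaves the arg max unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
n_intens, n_seq, n_clf, dim = 12, 5, 5, 8

w = rng.normal(size=dim)                      # shared weight vector (Equation 5)
x = rng.normal(size=(n_intens, n_seq, dim))   # shared test vectors (Equation 6)

# Score of the single classifier (Equation 8).
single = np.einsum("d,ijd->i", w, x)

# Ensemble of N_c identical weak learners fed identical data (Equation 2).
weights = np.tile(w, (n_clf, 1))
features = np.repeat(x[:, :, None, :], n_clf, axis=2)
ensemble = np.einsum("kd,ijkd->i", weights, features)

same_scale = np.allclose(ensemble, n_clf * single)                 # Equation 9
same_column = np.argmax(ensemble[:6]) == np.argmax(single[:6])     # Equation 4
same_row = np.argmax(ensemble[6:]) == np.argmax(single[6:])
```

All three checks evaluate to true for any choice of `w` and `x`, because a positive scaling factor cannot change which score is largest.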

Comparison Protocol 

We evaluated a variety of ensemble classifiers with overlapped
partitioning in order to examine the influence of the degree of
overlap together with the dimension reduction method. One of three
different conditions for dimension reduction was applied: stepwise,
PCA, or none. The resulting classifiers are denoted by overlapped
ensemble SWLDA (OSWLDA), overlapped ensemble PCA LDA (OPCALDA),
and overlapped ensemble LDA (OLDA), respectively.

These three classifiers were evaluated on data sets A and B. Data set
A, recorded by us, was analyzed in the small training data case
using 1/10 cross-validation and in the large training data case
using 9/10 cross-validation (conventional 10-fold cross-validation).
Thus, the same amount of training data was provided for each
Table 3. Evaluation parameters of ensemble classifiers with overlapped partitioning on data set B (BCI competition III data set II).





Evaluation method        #training letters        #test letters  $N_c$  $N_b$  #training data for a weak learner (ERPs)
Limited training data    5 letters (900 ERPs)     100 letters        5      1  180
(first 5 letters)                                                    5      2  360
                                                                     5      3  540
                                                                     5      4  720
                                                                     5      5  900
Full training data       85 letters               100 letters       17      1  900
                                                                    17      2  1800
                                                                    17      3  2700
                                                                    17      4  3600
                                                                    17      5  4500
                                                                    17      6  5400
                                                                    17      7  6300
                                                                    17      8  7200
                                                                    17      9  8100
                                                                    17     10  9000
                                                                    17     11  9900
                                                                    17     12  10800
                                                                    17     13  11700
                                                                    17     14  12600
                                                                    17     15  13500
                                                                    17     16  14400
                                                                    17     17  15300

The ensemble classifiers were trained on limited training data (900 training data) or full training data (15300 training data). The number of weak learners $N_c$ and the
number of blocks $N_b$ were parameters used in the overlapped partitioning. These evaluation methods and the parameters determine the amount of training data for a
weak learner in an ensemble classifier. The number of training data for a weak learner (#training data for a weak learner) can be computed by given training ERPs x $N_b/N_c$.
doi:10.1371/journal.pone.0093045.t003

ensemble classifier: 900 training data for the former and 8100
training data for the latter. Additionally, in the cross-validation,
the training and test data were strictly separated so that none of
the training data were used as test data. Data set B (BCI
competition III data set II) was also analyzed using limited training
data (ERPs corresponding to the first 5 letters) and using full
training data (ERPs corresponding to 85 letters). The former
contained 900 ERPs while the latter contained 15300 ERPs for
training.

To confirm the influence of overlapped partitioning, the degree
of overlap $N_b$ was varied, while the number of weak learners $N_c$
was fixed in the offline analysis. The evaluated combinations of
$N_c$ and $N_b$ for data sets A and B are summarized in Tables 2 and
3, respectively. In particular, in the case $N_b = N_c$, the ensemble
classifier with overlapped partitioning is equivalent to the single
classifier. In addition, in the case where $N_b = 1$ and $N_c > 1$, it
behaves as a conventional ensemble classifier with naive
partitioning. It should be noted that the algorithms were trained
on 900 training data of both data sets, which was much smaller
than the training data used in previous studies, for example, the
15300 training data used in the BCI competition III data set II
[23], and the 7560 data used in the BCI competition II data set IIb
[41]. Our comparison included the single SWLDA, which is commonly
used in this field, and the ensemble SWLDA proposed by Johnson
et al.
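To make the roles of $N_c$ and $N_b$ concrete, the following sketch builds overlapped partitions as lists of training indices. The scheme shown (equal blocks, with weak learner k receiving $N_b$ consecutive blocks and wrapping around cyclically) is an assumption chosen to satisfy the two special cases named above; it is not a transcription of the partitioning figure.

```python
def overlapped_partitions(n_samples, n_classifiers, n_blocks):
    """Return one sorted index list per weak learner.

    Assumed scheme: the data are cut into n_classifiers equal blocks and
    weak learner k receives blocks k, k+1, ..., k+n_blocks-1 (cyclically).
    """
    if n_samples % n_classifiers != 0:
        raise ValueError("n_samples must be divisible by n_classifiers")
    block = n_samples // n_classifiers
    partitions = []
    for k in range(n_classifiers):
        indices = []
        for b in range(n_blocks):
            start = ((k + b) % n_classifiers) * block
            indices.extend(range(start, start + block))
        partitions.append(sorted(indices))
    return partitions

# Training data per weak learner with 900 ERPs and N_c = 5 (small-data rows).
sizes_small = [len(overlapped_partitions(900, 5, nb)[0]) for nb in range(1, 6)]
# Training data per weak learner with 15300 ERPs and N_c = 17 (full data set B).
sizes_full = [len(overlapped_partitions(15300, 17, nb)[0]) for nb in range(1, 18)]
```

With `n_blocks=1` the partitions are disjoint (naive partitioning), and with `n_blocks == n_classifiers` every weak learner receives the complete training set (single classifier); the partition sizes follow the rule training ERPs x $N_b/N_c$ stated in the table notes.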

For the statistical analysis of data set A, the effects of the
intensification sequence ($N_s = 1, \ldots, 15$), dimension reduction
condition (stepwise, PCA, or none), and degree of overlap
($N_b = 1, \ldots, 5$) were evaluated by three-way repeated-measures
ANOVA followed by post hoc pairwise t-tests with Bonferroni's
method. No statistical analysis was applied to data set B because of
the limited number of participants.



Results 

The classification performances of OSWLDA, OPCALDA, and
OLDA were evaluated on data set A using 1/10 or 9/10
cross-validation and on data set B with limited or full training
data. The degree of overlap used in the overlapped partitioning
($N_b$) was varied while the number of weak learners in the ensemble
classifier ($N_c$) was fixed. As mentioned above, an overlapped
ensemble classifier behaves as an ensemble classifier with naive
partitioning when $N_c > 1$ and $N_b = 1$, and becomes a single
classifier when $N_c = N_b$.

Data Set A Using 1/10 Cross-validation 

EEG data in data set A were classified by OSWLDA,
OPCALDA, and OLDA using 1/10 cross-validation with the
parameters in Table 2. The classification performances of these
classifiers for each participant are shown in Figure 7. The mean
accuracies of these algorithms are shown in Figure 8 and in
Table 4.

The key finding was that OSWLDA showed higher classification
performance than the single SWLDA classifier ($N_b = 5$) and the
ensemble SWLDA classifier with naive partitioning ($N_b = 1$) when
900 training data were provided. As can be seen in Table 4, most
algorithms achieved the best performance when $N_b = 4$, while the
worst accuracy was observed when $N_b = 1$. Regarding OLDA,
when $N_b = 1$, the classification accuracy was close to the chance
level (1/36). As can be seen in Figure 8, OSWLDA ($N_b = 4$)
achieved a higher classification accuracy than the single SWLDA
classifier ($N_b = 5$), especially in the middle range of $N_s$
(from about $N_s = 3$). At $N_s = 5$, OSWLDA ($N_b = 4$) obtained
an 11.2% higher accuracy than the ensemble SWLDA classifier with
naive partitioning and a 4.8% higher accuracy than the single
SWLDA classifier. Moreover, OPCALDA ($N_b = 4$) achieved better
classification accuracies than OPCALDA ($N_b = 5$) when
$6 < N_s < 9$, although the differences were small. In contrast,
the accuracy of OLDA ($N_b = 4$) was close to that of the single
LDA classifier ($N_b = 5$), although OLDA ($N_b = 4$) achieved
slightly higher accuracies in some sequences.

Figure 9. Classification performances of ensemble classifiers on data set A using 9/10 cross-validation. OSWLDA, OPCALDA and OLDA
were trained on 8100 ERPs. Then data set A was classified by those classifiers, changing $N_s$ and $N_b$. The classification performances of all
participants are displayed.
doi:10.1371/journal.pone.0093045.g009

A three-way repeated-measures ANOVA with the intensification
sequence, dimension reduction condition, and degree of
overlap as factors was applied. The main effects of the
intensification sequence (F(14,126) = 166.6, p<0.01), dimension
reduction condition (F(2,18) = 614.6, p<0.01), and degree of overlap
(F(4,36) = 1356, p<0.01), and all their interactions (p<0.01 for
all), were significant. In addition, the post hoc pairwise t-tests
with Bonferroni's method revealed significant differences between
the dimension reduction conditions (p<0.01 for all) and between
pairs of $N_b$, except for the pair $N_b = 3$ and 5 (p<0.01 for all).

Data Set A Using 9/10 Cross-validation 

EEG data in data set A were also classified by the three
algorithms using 9/10 cross-validation with the parameters in
Table 2. Classification performances of the three algorithms for
each individual participant are shown in Figure 9. The mean
classification performances are shown in Figure 10 and Table 5.

The classification performance of ensemble classifiers with the
overlapped partitioning was as good as, or slightly better than, that
of the single classifier when 8100 training data were provided. As
shown in Figure 10, the worst classification performance was
obtained by the ensemble classifiers with $N_c = 45$ and $N_b = 1$
for all algorithms, which was the same as in the analysis of data
set A using 1/10 cross-validation. However, only a small performance
improvement of the overlapped ensemble classifiers can be found
when compared to the single classifier ($N_c = 45$, $N_b = 45$).

A three-way repeated-measures ANOVA with the intensification
sequence, dimension reduction condition, and degree of
overlap as factors was applied. The main effects of the
intensification sequence (F(14,126) = 135, p<0.01), dimension
reduction condition (F(2,18) = 510.9, p<0.01), and degree of overlap
(F(9,81) = 197.9, p<0.01), and all their interactions (p<0.01 for
all), were significant. In addition, the post hoc pairwise t-test was
applied. Significant differences between the dimension reduction
conditions (p<0.05 for all) were revealed. Also, significant
differences between the pairs containing $N_b = 1$, $N_b = 5$,
$N_b = 10$, and $N_b = 15$ (p<0.01 for all) were revealed.

Data Set B with Limited Training Data 

EEG data in data set B were classified by OSWLDA,
OPCALDA and OLDA using 900 training data with the parameters
in Table 3. The classification performances of OSWLDA, OPCALDA,
and OLDA evaluated on data set B using a limited
amount of training data (900 ERPs) are shown in Tables 6, 7, and
8, respectively.

OSWLDA and OPCALDA ($N_c = 5$ and $N_b = 3, 4$) achieved
better classification accuracies than those with naive partitioning
($N_c = 5$ and $N_b = 1$) and the single classifiers ($N_c = 5$ and
$N_b = 5$) when 900 training data were available. As for OSWLDA,
the best classification accuracies were obtained when $N_b = 4$.
Further, most of the best mean classification performances of
OPCALDA were obtained when $N_b = 3$ or $N_b = 4$. These tendencies
are similar to those in the analysis of data set A using 1/10
cross-validation. OSWLDA achieved about 10% (15% at best, when
$N_s = 6$) higher mean classification accuracy than the single
SWLDA classifier ($N_c = 5$, $N_b = 5$). OPCALDA also achieved a
5.5% higher mean classification accuracy than the single PCALDA
classifier ($N_c = 5$, $N_b = 5$) when $N_s = 12$. However, all of
the classification performances of OLDA were close to chance level.

Figure 10. Mean classification performances of ensemble classifiers on data set A using 9/10 cross-validation. OSWLDA, OPCALDA and
OLDA were trained on 8100 ERPs. The mean classification accuracies over ten participants are presented.

Data Set B with Full Training Data 

EEG data in data set B were classified by OSWLDA,
OPCALDA and OLDA using 15300 training data with the
parameters in Table 3. Classification performances of the three
algorithms evaluated on data set B using full training data (15300
ERPs) are presented in Tables 9, 10, and 11, respectively.

The classification performances of ensemble classifiers with the
overlapped partitioning (OSWLDA, OPCALDA and OLDA,
$N_c = 17$, $1 < N_b < 17$) were as good as, or slightly better
than, those with naive partitioning ($N_c = 17$ and $N_b = 1$) and
those of the single classifiers ($N_c = 17$ and $N_b = 17$) in most
sequences when 15300 training data were available. The best
classification performance was achieved by OSWLDA: 98% when
$N_s = 15$, $N_c = 17$, and $N_b = 7, 9, 11$. That is, OSWLDA
achieved a 1.5% higher classification performance than the ensemble
of SVMs used by the winner of BCI competition III data set II [23].
OSWLDA achieved about 3% improvement over the single SWLDA
($N_c = 17$, $N_b = 17$). However, little improvement by the
ensemble classifier with the overlapped partitioning can be seen
compared to the single classifier, just as in the analysis of data
set A using 9/10 cross-validation.



Discussion 

To examine the influence of overlapped partitioning
compared to traditional naive partitioning and a single classifier,
the classification accuracies of ensemble classifiers with those
partitioning methods were compared when 900 training data were
given. Two different P300-based BCI data sets were evaluated:
data set A with 1/10 cross-validation and data set B with limited
training data. The single classifier ($N_c = N_b$) and the traditional
ensemble classifier with naive partitioning ($N_c > 1$ and $N_b = 1$)
were also compared at the same time. One of three conditions for
dimension reduction (stepwise, PCA, or none) was also
applied. The results show that OSWLDA trained on 900 ERPs
achieved higher classification accuracy than the single SWLDA
classifier ($N_c = 5$, $N_b = 5$) and the ensemble SWLDA classifier
with naive partitioning ($N_c = 5$, $N_b = 1$) for both data sets
(see Tables 4 and 6). More specifically, the proposed OSWLDA trained
on 900 ERPs achieved a 4.8% higher accuracy than the single SWLDA
for data set A ($N_c = 5$, $N_b = 4$, $N_s = 5$) and a 15% higher
accuracy than the single SWLDA for data set B ($N_c = 5$, $N_b = 4$,
$N_s = 6$), where the single SWLDA is an established and commonly
used classification algorithm for P300-based BCI.

The performance improvement of proposed classifiers trained 
on 900 ERPs was due to the mutual effect of the overlapped 
partitioning and the dimension reduction. In the statistical analysis 
of data set A using 1/10 cross-validation, the main effects of the 
intensification sequence, degree of overlap (Nb), dimension 
reduction conditions, and their interactions were significant. 
According to the results shown in Figure 8 (c), indeed, the 
overlapped ensemble LDA classifier without dimension reduction 
(OLDA) did not achieve higher classification accuracies than a 
single LDA classifier (Nb = 5) in many cases. Applying a dimension 
reduction method in itself is a solution to improve the classification 
performance of the ensemble classifier with naive partitioning. 
Table 9. Classification accuracies (%) of OSWLDA on data set B with full training data.

$N_c$  $N_b$  Participant  Intensification sequences $N_s$
                              1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
   17     17  A              21    32    51    51    60    65    68    76    79    86    85    89    94    93    94
              B              42    62    69    70    82    84    88    91    92    95    94    94    94    94    97
              Mean         31.5  47.0  60.0  60.5  71.0  74.5  78.0  83.5  85.5  90.5  89.5  91.5  94.0  93.5  95.5

The best mean accuracy among all $N_b$ for each repetition is written in bold and the worst is underlined. An overlapped ensemble classifier becomes an ensemble
classifier with naive partitioning when $N_c = 17$ and $N_b = 1$. The classifier is equivalent to a single classifier when $N_c = 17$ and $N_b = 17$.
doi:10.1371/journal.pone.0093045.t009



However, as shown in Figures 8 (a) and (b), when $N_b = 1$, the
dimension reduction method alone did not improve the classification
accuracy as compared to the corresponding single classifiers. On the
other hand, as also shown in Figures 8 (a) and (b), the overlapped
ensemble LDA classifier together with the stepwise method
(OSWLDA, $N_b = 4$) or PCA (OPCALDA, $N_b = 4$) achieved higher
classification accuracy than the corresponding single classifiers
($N_b = 5$). This tendency was especially clear for OSWLDA. Thus, the
improvement in the classification accuracy was due not only to
the dimension reduction or the partitioning method by itself but
also to their mutual effects. Taking this into consideration, the
overlapped partitioning method, together with the dimension
reduction method, effectively improved the classification
performance of P300-based BCI.

The performance improvement of the proposed classifiers
compared to the single classifier was small when a large amount
of training data was provided. However, the classification
performances of the proposed classifiers trained on a large amount
of data were high, reaching 99.6% for data set A (see
Table 5) and 98% for data set B (see Table 9). In those cases,
however, a major performance improvement caused by overlapped
partitioning was not confirmed. This was because the given
training data were large enough that the overfitting problem
should not occur in most cases. Thus, the advantage of overlapped
partitioning can be seen when a small amount of high-dimensional
training data is provided, as in the analysis of data set A
using 1/10 cross-validation and of data set B with limited training
data.

We suggest using conventional cross-validation to find the
optimal overlapping ratio $N_b/N_c$ before an online experiment.
However, this prolongs the training time for the classifier.
As an alternative, we suggest using $N_b/N_c \approx 0.8$ (e.g.,
$N_b = 4$ and $N_c = 5$) because it showed near-optimal results for
both data sets. In the small training data case (900 ERP data),
OSWLDA and OPCALDA with $N_b/N_c = 0.8$ ($N_b = 4$ and $N_c = 5$)
were near-optimal for both data sets A and B, but OLDA with
$N_b/N_c = 0.8$ performed as well only for data set A. In the large
training data case, OSWLDA, OPCALDA and OLDA with
$N_b/N_c = 0.78$ ($N_b = 35$ and $N_c = 45$) evaluated on data set A
and with $N_b/N_c = 0.82$ ($N_b = 14$ and $N_c = 17$) evaluated on
data set B achieved reasonable classification accuracies. In this
way, the overlapping ratio $N_b/N_c \approx 0.8$ was near-optimal
and can be employed to avoid using cross-validation.
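The rule of thumb above can be written as a small helper that picks, from the degrees of overlap under consideration, the one whose ratio to $N_c$ is closest to 0.8. The function name and its interface are hypothetical.

```python
def choose_n_blocks(n_classifiers, candidates, ratio=0.8):
    """Return the candidate N_b whose ratio N_b / N_c is closest to `ratio`."""
    return min(candidates, key=lambda nb: abs(nb / n_classifiers - ratio))

# Candidate values follow the grids evaluated in this study.
nb_small = choose_n_blocks(5, range(1, 6))                     # N_c = 5
nb_set_a = choose_n_blocks(45, [1] + list(range(5, 50, 5)))    # N_c = 45
nb_set_b = choose_n_blocks(17, range(1, 18))                   # N_c = 17
```

For the three settings discussed above, this selects $N_b = 4$, 35, and 14, respectively, which are the values quoted in the text.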

This study first showed that the ensemble LDA classifiers with
conventional naive partitioning were not effective compared to
the single LDA classifier and the ensemble classifier with
overlapped partitioning when 900 training data were given. This
result implies that the ensemble LDA classifier with naive
partitioning requires a longer training session to obtain more
than 900 training data before an online experiment. It should be
noted that 900 training data is the smallest amount used for the
evaluation of an ensemble classifier to date. In contrast, the
ensemble classifiers with the proposed overlapped partitioning
method showed a significant improvement in the classification
accuracy, which was even better than that of a single classifier
when the stepwise method or PCA was applied for dimension reduction.
Thus, overlapped partitioning was shown to be more practical
than naive partitioning when the given training data were small
(e.g., 900 training data).

The performance deterioration of the ensemble LDA classifiers
with naive partitioning may be due to the poor estimation of the
covariance matrices of LDA weak learners. Such performance
deterioration can be seen in the results of OLDA on data set A
using 1/10 cross-validation ($N_c = 5$, $N_b = 1$), OLDA on data set
A using 9/10 cross-validation, OSWLDA and OPCALDA on data
set B with limited training data ($N_c = 5$, $N_b = 1, 2$), OLDA on
data set B with limited training data, and OLDA on data set B with
full training data ($N_c = 17$, $N_b = 1$). The problem appears when
$N_b = 1, 2$ because only a small amount of training data was
provided to the weak learners (see Tables 2 and 3). Regarding data
set B, 900 training data were not sufficient to train weak learners
of OLDA ($N_c = 5$, $N_b \leq 5$ with limited training data and
$N_c = 17$, $N_b = 1$ with full training data). Compared to data set
A, data set B seems to require more training data because the EEG
data of data set B were higher dimensional (896 dimensions).
Estimated covariance matrices are imprecise when a small amount of
high-dimensional training data is given [22]. Johnson and Krusienski
first evaluated the classification performance of the ensemble SWLDA
classifier with naive partitioning [27]. They evaluated the
algorithm by changing the number of classifiers ($N_c$ was changed
while $N_b$ was fixed to 1). In addition, three weighting methods for
the ensemble classifier were evaluated. As a result, they found
that the ensemble SWLDA classifier showed better performance
than the single SWLDA classifier, depending on participants,
though no statistically significant difference was found. They also
argued that the classification performance decreased when
$N_c > 6$ and $N_b = 1$ because the amount of training data for a
weak learner becomes small. We consider that a similar problem arose
in the application of the ensemble classifier with overlapped
partitioning when $N_c = 5$ and $N_b = 1$, which is similar to their
conditions. Such a problem can be avoided by applying the
overlapped partitioning together with a dimension reduction
method.
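One way to see why few high-dimensional samples give a poor covariance estimate is to look at the rank of the sample covariance matrix, which cannot exceed the number of samples minus one. The sketch below uses random Gaussian data as a stand-in for ERP features; the function name and data are hypothetical, and only the sizes (180 ERPs per weak learner, 896 dimensions) are taken from the text.

```python
import numpy as np

def covariance_rank(n_samples, n_features, seed=0):
    """Rank of the sample covariance of random data with the given shape."""
    rng = np.random.default_rng(seed)
    data = rng.normal(size=(n_samples, n_features))
    cov = np.cov(data, rowvar=False)          # (n_features, n_features)
    return int(np.linalg.matrix_rank(cov))

# A weak learner with 180 ERPs of 896 dimensions: the covariance matrix is
# rank deficient (rank at most 179), so it cannot be inverted directly.
rank_weak_learner = covariance_rank(180, 896)
```

A rank-deficient covariance matrix has no ordinary inverse, so Equation 1 cannot be evaluated reliably without reducing the dimension or enlarging the partition.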

The ensemble classifiers with overlapped partitioning trained on
900 ERPs showed better classification performances than a single
classifier in the middle intensification sequence condition in the
offline analysis. According to Figure 8 (a), OSWLDA ($N_b = 4$)
achieved higher classification accuracy than the single SWLDA
classifier ($N_b = 5$) for $3 < N_s < 5$. In contrast, OPCALDA
($N_b = 4$) showed higher classification accuracy than the single
Table 10. Classification accuracies (%) of OPCALDA on data set B with full training data.

$N_c$  $N_b$  Participant  Intensification sequences $N_s$
                              1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
   17     17  A              18    34    49    56    56    63    76    78    79    84    84    91    92    92    95
              B              46    64    62    73    81    90    91    90    90    93    93    93    93    95    97
              Mean         32.0  49.0  55.5  64.5  68.5  76.5  83.5  84.0  84.5  88.5  88.5  92.0  92.5  93.5  96.0

The best mean accuracy among all $N_b$ for each repetition is written in bold and the worst is underlined. An overlapped ensemble classifier becomes an ensemble
classifier with naive partitioning when $N_c = 17$ and $N_b = 1$. The classifier is equivalent to a single classifier when $N_c = 17$ and $N_b = 17$.
doi:10.1371/journal.pone.0093045.t010



PCA LDA classifier ($N_b = 5$) when $6 < N_s < 9$. This result implies
that the ensemble classifier with overlapped partitioning was
beneficial for a middle number of intensification sequences. $N_s$
determines the number of terms used to compute the score for
decision making according to Equation 2. The performance saturates
as $N_s$ becomes larger, while the classification performance is
unreliable when $N_s$ is small. In both cases, differences between
the classification performances were hard to confirm. This might
explain why the classification performance difference was clearest
for a middle number of sequences.

The selection of the number of intensification sequences in
an online P300-based BCI experiment depends on the application
of the BCI system. One criterion is the information transfer rate
(ITR), which takes the accuracy, number of outputs, and output
time (the number of sequences) into consideration [35]. OSWLDA
on data set A using 1/10 cross-validation ($N_b = 4$) achieved the
highest ITR (15.7 bits per minute) at $N_s = 3$, although only a
71.4% accuracy was expected in an online experiment. On the
other hand, accuracy must be prioritized, for example, when the
BCI is used to provide precise control of a robotic manipulator
that could be dangerous. To decide parameters such as the
number of intensification sequences, we should consider which
criterion (accuracy, speed, or ITR) should be optimized for the
BCI application at hand.
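For orientation, the ITR trade-off can be sketched with the definition by Wolpaw et al. that is widely used in BCI research; that this is exactly the definition of [35] is an assumption. The function names are hypothetical, and the time per selection must be supplied by the user because the stimulus timing is not restated here.

```python
import math

def bits_per_selection(accuracy, n_choices=36):
    """Wolpaw-style information per selection for an n_choices speller."""
    p, n = accuracy, n_choices
    bits = math.log2(n)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    return bits

def itr_bits_per_minute(accuracy, seconds_per_selection, n_choices=36):
    """Information transfer rate given the time needed for one selection."""
    return bits_per_selection(accuracy, n_choices) * 60.0 / seconds_per_selection

# Information carried by one letter selection at the 71.4% accuracy quoted above.
bits_at_quoted_accuracy = bits_per_selection(0.714)
```

Because the time per selection grows with $N_s$ while the accuracy saturates, the ITR typically peaks at a small number of sequences, which is consistent with the maximum at $N_s = 3$ reported above.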

The amount of training data also determines the expected online
classification accuracy. If the system needs over
70% mean classification accuracy, only 900 training data are
required. If over 95% mean accuracy is required, a large
amount of training data should be prepared. Most BCI
applications do not usually require over 95% classification
accuracy because they are free from danger. Thus, 900 training
data are sufficient to achieve over 70% mean accuracy for most
applications of BCI.

We would like to emphasize that the ensemble classifiers with
overlapped partitioning required less training data than those with
naive partitioning. OSWLDA and OPCALDA performed better
than the ensemble classifier with naive partitioning, well enough
to achieve over 90% classification accuracy using only 900 training
data. In particular, the mean classification accuracy of OSWLDA
($N_c = 5$, $N_b = 4$) with the small training data was as good as
that of the ensemble SWLDA with naive partitioning ($N_c = 5$,
$N_b = 1$) for data set A. In this way, the ensemble classifier with
overlapped partitioning requires fewer training samples than that
with naive partitioning, so it might help avoid expensive
experiments.

In this research, PCA and the stepwise method were applied for
dimension reduction. PCA and the stepwise method have
different statistical properties: PCA finds the projection that
maximizes the data variance, while the stepwise method selects
spatiotemporal variables. Although no great difference was found
in the classification accuracy for data set A using 1/10 and 9/10
cross-validation and data set B with full training data, OSWLDA
showed better performance than OPCALDA for data set B with
limited training data. In this way, the stepwise method was robust
for both P300-based BCI data sets. The difference between the
two also appears in the online/offline test computational cost: the
stepwise method requires a smaller processing burden than PCA
because the stepwise method does not project the data at test
time. The difference will be more obvious when $N_c$
becomes large. Considering the computational cost, the stepwise
method is preferable when a large number of classifiers is
required.

In future research, LDA with shrinkage [22] or Bayesian LDA 
[32] will be applied to the ensemble classifier with overlapped 
partitioning. These two methods estimate covariance matrices in 
different ways so that LDA in itself becomes robust against a lack 
of training data. Thus, it may be possible to achieve better 
classification accuracy with a smaller amount of training data by 
applying the two methods. The proposed ensemble classifiers with 
overlapped partitioning may be applicable to other types of BCIs 
such as an event-related desynchronization/synchronization 
(ERD/ERS)-based BCI [42]. In fact, some ensemble classifiers
for ERD/ERS-based BCIs have been evaluated [43], and our proposed
overlapped ensemble classifiers might also be applicable.
Moreover, the ensemble classifier with the overlapped partitioning
can be used in other pattern recognition problems, e.g., cancer
classification [44] or fMRI data analysis [45]. Furthermore,
clustering algorithms such as k-means clustering [46] could be 
used for a new overlapped partitioning of the ensemble classifiers. 
By clustering the data with overlaps, classifiers that perform well 
for specific features can be trained. Thus, the clustered 
partitioning with overlaps may show an even better classification 
performance. 
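As a pointer for the shrinkage direction mentioned above, a common form of covariance shrinkage blends the sample covariance with a scaled identity matrix. The sketch below assumes that form with a fixed, hand-picked shrinkage parameter `gamma`; the data-driven choice of the parameter is not shown, and nothing here was evaluated in this study.

```python
import numpy as np

def shrink_covariance(cov, gamma):
    """Return (1 - gamma) * cov + gamma * nu * I, with nu the mean eigenvalue."""
    dim = cov.shape[0]
    nu = np.trace(cov) / dim
    return (1.0 - gamma) * cov + gamma * nu * np.eye(dim)

# Hypothetical case with fewer samples (20) than dimensions (50).
rng = np.random.default_rng(4)
data = rng.normal(size=(20, 50))
cov = np.cov(data, rowvar=False)          # singular sample covariance
shrunk = shrink_covariance(cov, gamma=0.1)
min_eigenvalue = float(np.linalg.eigvalsh(shrunk).min())
```

The shrunk matrix has strictly positive eigenvalues and is therefore invertible even when the sample covariance is singular, which is the property that makes such an estimate attractive for weak learners trained on small partitions.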

Conclusion 

In this study, ensemble LDA classifiers with the newly
proposed overlapped partitioning method were evaluated on
our original P300-based BCI data set and the BCI competition
III data set II. In the comparison, the classifiers were trained on
limited training data (900 ERPs) and large training data. The
ensemble LDA classifier with traditional naive partitioning and the
single classifier were also evaluated. One of three conditions for
dimension reduction (stepwise, PCA, or none) was applied. As
a result, the ensemble LDA classifier with overlapped partitioning
and the stepwise method (OSWLDA) showed higher accuracy
than the commonly used single SWLDA classifier and the
ensemble SWLDA classifier with naive partitioning when 900
training data were available. In addition, the ensemble LDA
classifiers with naive partitioning showed the worst performance for
most conditions.
Table 11. Classification accuracies (%) of OLDA on data set B with full training data.

The best mean accuracy among all $N_b$ for each repetition is written in bold and the worst is underlined. An overlapped ensemble classifier becomes an ensemble
classifier with naive partitioning when $N_c = 17$ and $N_b = 1$. The classifier is equivalent to a single classifier when $N_c = 17$ and $N_b = 17$.
doi:10.1371/journal.pone.0093045.t011



We suggest using the stepwise method for dimension reduction
in the online implementation. In future research, LDA with
shrinkage or Bayesian LDA will be applied to the ensemble
classifier with overlapped partitioning.